Chosen Plaintext Attack (CPA)

A chosen plaintext attack (CPA) models the scenario where an adversary can choose arbitrary plaintexts and obtain their corresponding ciphertexts that are all generated by encrypting the messages with the same secret key . The adversary's goal is to then decrypt a ciphertext that was obtained by encrypting an unknown message , also with the secret key .

In World War 2, the British would place mines at specific locations and when the Germans found them, they would encrypt their locations and send them to their superiors. The intercepted encrypted messages would later be used at Bletchley Park to break the encryption scheme of the Germans.

This scenario gives the adversary (partial) control over the messages and ciphertexts it has access to and one can imagine this as the attacker being able to influence to some extent the messages that are exchanged by the two authentic parties Alice and Bob.

It is imperative to remember that in the CPA model, all messages are encrypted using the same key.

CPA-Security

So what does it mean for an encryption scheme to be secure under the chosen plaintext thread model?

The efficient adversary $\textit{Eve}$ is given oracle access to the encryption function $\textit{Enc}_k$ for some random secret key $k$ and queries it with $q$ messages $m_1, m_2, ..., m_q$ to obtain their respective ciphertexts $c_1, c_2, ..., c_q$. The cipher $(\textit{Enc}, \textit{Dec})$ is *CPA-secure* if for any two messages $\mu_0, \mu_1$ and ciphertext $c$ belonging to either $\mu_0$ or $\mu_1$, the adversary $\textit{Eve}$ still cannot guess with probability non-negligibly greater than $\frac{1}{2}$ whether $c$ is the encryption of $\mu_0$ or $\mu_1$, i.e.

$$\Pr_{k \leftarrow \mathcal{K}, b \leftarrow \{0,1\}}[\textit{Eve}(\textit{Enc}_k(\mu_b)) = \mu_b] \le \frac{1}{2} + \textit{negl}(n)$$

As previously mentioned, the adversary has oracle access to $\textit{Enc}_k$ and can thus obtain plaintext-ciphertext pairs $(m_1, c_1), (m_2,c_2), ..., (m_q, c_q)$. They then attempt to guess whether a given ciphertext $c$ belongs to a message $\mu_0$ or $\mu_1$ (the adversary of course also knows $\mu_0$ and $\mu_1$). The word "any" in the definition entails that Eve is even free to choose $\mu_0$ and $\mu_1$ herself. The cipher is considered CPA-secure if even with all this information, the adversary cannot guess with success marginally better than $\frac{1}{2}$ if the ciphertext corresponds to $\mu_0$ or $\mu_1$.

At first glance, there appears to be something wrong with this definition. The adversary Eve is free to choose both as well as and . Therefore, it seems that this definition can be trivially broken by Eve simply by choosing to be the same as one of the previously queried messages . When Eve is presented with a ciphertext at the end, she can just check if is the ciphertext she obtained when querying - since , she will know with 100% certainty whether the ciphertext is the encryption of or . This leads to the following consequence for all CPA-secure ciphers.

There is no CPA-secure cipher with a deterministic encryption function $\textit{Enc}$.

If is probabilistic, i.e. it uses internal randomness, then the same message will produce different ciphertexts each time it is encrypted which kills the aforementioned breaking technique stone-dead. It might seem weird that the same message can produce different ciphertexts at first, but this is actually fairly easy to implement. The internal randomness used in each encryption can be encoded in the ciphertext is such a way that it can be recovered later if one knows the secret key .

This property of CPA-security means that it is a stronger notion than semantic security - every CPA-secure cipher is also semantically secure, but the opposite is not necessarily true. In fact, CPA-security is nowadays the bare minimum definition which is expected to be satisfied by a cipher in order to be considered usable, since it provides security in the case of key reuse.

Theoretical Implementation

As with many things in cryptography, pseudorandom function generators (PRFGs) come to the rescue when trying to implement a CPA-secure cipher.

This is just a proof-of-concept and the following cipher is *not* used in practice.

Suppose we have a pseudorandom function generator . The encryption function is first going to generate a random string of length . It will then seed the PRFG with the key (which also has length ) and it will pass to it. The output of the PRFG will be XOR-ed with the message. Finally, will prepend to this XOR-ed value:

fn Enc(key: str[n], message: str[l]) -> str[n + l] 
{
	let r = random_binary_string(length: n);
	return r + (xor(PRFG(key, r), message));
}

The decryption function takes the ciphertext of length and parses it as two strings - a string of length and a string of length . It then seeds the PRFG with the key and passes it the string . The output of the PRFG is then XOR-ed with to obtain the original message.

fn Dec((key: str[n], ciphertext: str[n + l])) -> str[l] 
{
	let r = ciphertext[0..n];
	let z = ciphertext[n..];
	return xor(PRFG(key, r), z);
}

Indeed, this is a valid encryption scheme - every ciphertext can only be mapped to one plaintext.

Given a key $k$, the encryption of a message $m$ is

$$\textit{Enc}_k(m) = r||(\textit{PRFG}(k, r) \oplus m)$$

Let's see the decryption of this output for the same key $k$:

$$\textit{Dec}_k(r||(\textit{PRFG}(k, r) \oplus m)) = \textit{PRFG}(k, r) \oplus (\textit{PRFG}(k, r) \oplus m) = m$$

Therefore, the validity condition is satisfied.

Moreover, this construction has a probabilistic encryption function and also turns out to be CPA-secure.

Suppose we follow the CPA model and the adversary Eve obtains the ciphertexts $c_1, c_2, ..., c_q$ of the messages $m_1, m_2, ..., m_q$. For the $i$-th encryption a random string $r_i$ of length $n$ is generated and each message is also encrypted with the same key $k$ (as per the definition of CPA-security).

Each of these strings is generated randomly, so the probability that the last string $r_q$ is the same as one of the previous random strings is $\frac{q}{2^n}$, which is negligible. 

Suppose, towards contradiction that Eve could break the CPA-security of the cipher with probability $\frac{1}{2} + \xi$ for some non-negligible $\xi$. If instead of a PRFG the encryption used a truly random function $R$, then the probability that Eve could distinguish between $\mu_0$ and $\mu_1$ would be strictly $\frac{1}{2}$ because she would simply lack any additional information. However, the encryption *does* use a PRFG and if Eve can distinguish between an encryption of $\mu_0$ and an encryption of $\mu_1$ with probability non-negligibly greater than $\frac{1}{2}$, then that means that she can distinguish between the output of a PRFG and that of a truly random function with non-neglgible advantage over $\frac{1}{2}$, which is a contradiction.